-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UInt128 improve division by 64 bit on x64 #99747
UInt128 improve division by 64 bit on x64 #99747
Conversation
Tagging subscribers to this area: @dotnet/area-system-numerics |
6170e5e
to
f236089
Compare
This has build failures that need to be addressed. Looks to be warning due to the usage of an API marked preview. |
2aa355d
to
60cc066
Compare
@tannergooding I added suppression for "CA2252 // This API requires opting into preview features" in source. I assume there is a good reason why preview features are no longer allowed System.Private.CoreLib when it was previously allowed for "dogfooding preview features", so looking at enabling it for the project seemed to be out of question. Is there a better way to solve it available to corelib / the dotnet runtime repro? |
This API is marked with Preview attribute because it produces suboptimal code. Have you verified that this change actually improves performance? In general, any change that is a performance optimization needs to come with perf numbers. |
It is not optimal, but it is quite good As I understand it the major issue which prevents general usage in Math.DivRem is that constant detection (and maybe propagation) is not yet done. The generated code is can often be optimal, unless you use to many value tuple calls and it starts to spill to the stack. As for preview
Yes, see Uint128 division benchmarks below, the it should give a much better confidence than my original statement about it being at least 10ns faster. (it is almost 20ns or 9 times faster) This PRcommit 60cc066 (I built the benchmarks as self contained and then copied over all dll's from (runtime\artifacts\bin\coreclr\windows.x64.Release)
NET 9 preview 4
Source code```// See https://aka.ms/new-console-template for more information using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running;Console.WriteLine("Hello, World!"); // Run benchmakrs using benchmarks switcher ///
|
I have done quick perf tests:
It runs 1.3x slower with this change. Would it be better to apply this optimization only for non-zero higher bits? |
…s known that upper is zero such as division by constant)
I've changed the order so that the new code is only used as a fallback in case of non-zero higher bits. My best guess is that the performance change comes from PGO based aggressive inlining in which case the standard division can get replaced with mul and shift, but the instrinct will not. |
Rerunning CI. Going to finish reviewing this today after CI finishes |
@tannergooding I have updated the suppressions for this PR and the decimal PR to handle the introduction of ExperimentalAttribute, so it compiles again. |
This looks good. But could you give some updated perf numbers prior to this getting merged? |
@tannergooding I reran above benchmarks with rc1 and the last commit of this branc PR
RC.1
your benchmark program
|
Adds a similar fast path as for decimal
#99212 for when divisor fits in 64 bit.
It should inprove performance quite a bit if the UInt128 is larger than 64bit.
I did not do any benchmarks but based on similar test for 96 by 32 division the new code should be at least 10ns faster.
Question:
That should allow specialized faster code for common scenarios without adding complicated code to the JIT to detect and handle it without lots of runtime checks for 64/32bit values.
Note: It might be possible to further improve performance for non x64 by calling into decimal`s Div96By32 or Div128By64 if divisor is lower than than the top 64 byte for all other platforms, but it would require measuring the performance.